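Elasticsearch Series: Getting Started
Common Terms
- Document: a data record that a user stores in es (comparable to a row in a database table).
- Index: a collection of documents that share the same fields (comparable to a table in a database; since es 6.0 an index can contain only one type, and the official announcement is that the type concept will be removed in the future).
- Node: a running instance of Elasticsearch, the building block of a cluster.
- Cluster: one or more nodes that together provide service to clients.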
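Document
A Document in es is a JSON object made up of fields (Field). Common data types:
- String: `text` (a string that is tokenized), `keyword` (a string that is not tokenized)
- Numeric: `long`, `integer`, `short`, `byte`, `double`, `float`, `half_float`, `scaled_float`
- Boolean: `boolean`
- Date: `date`
- Binary: `binary`
- Range: `integer_range`, `float_range`, `long_range`, `double_range`, `date_range`
- Every document has a unique id
  - It can be specified by the user
  - It can also be generated automatically by es
- For example, one Nginx log entry is stored in ES as one document (Document): the log information is structured into multiple fields (Field), and each field name (Field Name) maps to one field value (Field Value)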
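To make the idea of a structured log document concrete, here is a minimal Python sketch that turns one access-log line into a JSON object of fields. The sample line, the regular expression, and the field names (`remote_addr`, `time_local`, and so on) are assumptions chosen for illustration; the article does not show the actual log or its fields.

```python
import json
import re

# Hypothetical access-log line; the real log from the article is not available.
LOG_LINE = '127.0.0.1 - - [03/Apr/2020:10:15:32 +0800] "GET /index.html HTTP/1.1" 200 612'

# Assumed layout: client address, two ignored fields, timestamp, request line, status, size.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)


def log_line_to_document(line):
    """Split one log line into field name / field value pairs."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not match the assumed log format")
    doc = match.groupdict()
    # Numeric fields are converted so they would be stored as numbers, not strings.
    doc["status"] = int(doc["status"])
    doc["body_bytes_sent"] = int(doc["body_bytes_sent"])
    return doc


document = log_line_to_document(LOG_LINE)
print(json.dumps(document, indent=2))
```

Each key of the resulting object plays the role of a Field Name and each value that of a Field Value.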
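Document MetaData
- Every Document carries document metadata (Document MetaData) that describes the document
  - `_index`: name of the index the document belongs to
  - `_type`: name of the type the document belongs to
  - `_id`: unique id of the document
  - `_uid`: composite id made up of `_type` and `_id` (`_type` no longer has any effect in 6.x, so in 6.x this value is the same as `_id`)
  - `_source`: the original JSON data of the document; the content of every field can be read from here
  - `_all`: combines the content of all fields into this one field; disabled by default (officially not recommended)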
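Index
- An index stores documents (Document) that share the same structure
- Every index has its own mapping definition, which defines field names and types
- A cluster can hold multiple indices, for example:
  - nginx logs can be stored in one index per day, which makes maintenance easier
    - nginx-log-2020-04-03
    - nginx-log-2020-04-04
    - nginx-log-2020-04-05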
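The one-index-per-day naming scheme can be sketched in a few lines of Python. The helper names `daily_index_name` and `index_names_for_range` are hypothetical, and the `nginx-log` prefix simply mirrors the index names listed above.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="nginx-log"):
    """Return the index name for one calendar day, e.g. nginx-log-2020-04-03."""
    return f"{prefix}-{day.isoformat()}"


def index_names_for_range(start, days, prefix="nginx-log"):
    """Return the index names for `days` consecutive days beginning at `start`."""
    return [daily_index_name(start + timedelta(days=i), prefix) for i in range(days)]


names = index_names_for_range(date(2020, 4, 3), 3)
print(names)
```

With names like these, dropping old data means deleting whole indices by date rather than deleting individual documents.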
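REST API
- An Elasticsearch cluster exposes a RESTful API
  - REST: REpresentational State Transfer
  - The URI identifies the resource, such as an Index or a Document
  - The HTTP method specifies the operation on the resource, such as GET, POST, PUT, DELETE
- Two common ways to interact with it
  - curl on the command line
  - Kibana DevTools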
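As an illustration of "URI identifies the resource, HTTP method identifies the operation", the following sketch builds a request object with Python's standard library without sending it. The helper `build_request` is hypothetical, and the base URL `http://localhost:9200` is an assumption about where a node might be listening.

```python
import json
from urllib.request import Request


def build_request(method, path, body=None, base_url="http://localhost:9200"):
    """Build (but do not send) an HTTP request for an Elasticsearch-style REST API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(base_url + path, data=data, headers=headers, method=method)


# PUT on a document URI: create or replace that document.
put_doc = build_request("PUT", "/test_index/doc/1", {"username": "Jiavg", "age": 21})
# DELETE on an index URI: remove the whole index.
delete_index = build_request("DELETE", "/test_index")

for req in (put_doc, delete_index):
    print(req.get_method(), req.full_url)
```

Passing such an object to `urllib.request.urlopen` would actually send it; that step is left out here because it needs a running cluster.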
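Index API
es provides a dedicated Index API for creating, updating, and deleting indices and their settings.
Create an index
```
PUT /{index_name}
```
Example:
```shell
# request
PUT /test_index
# response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "test_index"
}
```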
List existing indices
```
GET /_cat/indices
```
Example:
```shell
# request
GET /_cat/indices
# response
red    open account    eIBKm9zfQhOZkW6tC1uyEA 5 1 1 0 5.4kb 5.4kb
yellow open test_index XTzQRFtzRqK3B3EfULLrEg 5 1 0 0 1.1kb 1.1kb
```
Delete an index
```
DELETE /{index_name}
```
Example:
```shell
# request
DELETE /test_index
# response
{
  "acknowledged": true
}
```
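Document API
es provides a dedicated Document API.
Create a document (if the index does not exist when the document is created, es automatically creates the corresponding index and type)
- Create a document with a specified id
```
# The type name has no practical effect from 6.x onward and will be removed in a future version; any value works here, and the meaningless name doc is the usual choice
PUT /{index_name}/{type_name}/{id}
{
# document content
}
```
Example:
```shell
# request
PUT /test_index/doc/1
{
  "username": "Jiavg",
  "age": 21
}
# response
# _version exists to prevent errors when the document is modified concurrently
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
```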
- Create a document without specifying an id
```
POST /test_index/doc
{
# document content
}
```
Example:
```shell
# request
POST /test_index/doc
{
"username": "jlc",
"age": 20
}
# response
# no id was specified, so es generates one
{
"_index": "test_index",
"_type": "doc",
"_id": "QzXQT3EBkfca6l6Y9SXp",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
```
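The `_version` field in the responses above is described as a guard against concurrent modification. The general technique behind such a guard is optimistic concurrency control, which the following toy in-memory sketch illustrates. It is not how Elasticsearch implements versioning internally; the class `VersionedStore` and the `expected_version` parameter are hypothetical names used only for this example.

```python
class VersionConflictError(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    """Toy in-memory store: each id maps to (version, source)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, source, expected_version=None):
        """Write a document; reject the write if expected_version is out of date."""
        current_version, _ = self._docs.get(doc_id, (0, None))
        if expected_version is not None and expected_version != current_version:
            raise VersionConflictError(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return {
            "_id": doc_id,
            "_version": new_version,
            "result": "created" if current_version == 0 else "updated",
        }


store = VersionedStore()
first = store.put("1", {"username": "Jiavg", "age": 21})
second = store.put("1", {"username": "Jiavg", "age": 22}, expected_version=1)
try:
    # A second writer still believes the document is at version 1.
    store.put("1", {"username": "Jiavg", "age": 30}, expected_version=1)
    conflict = False
except VersionConflictError:
    conflict = True
print(first["_version"], second["_version"], conflict)
```

The stale writer is rejected instead of silently overwriting the newer content, which is the kind of error a version number helps prevent.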
Query a document
- Query a document by its id
```
GET /{index_name}/{type_name}/{id}
```
Example:
```shell
# request
GET /test_index/doc/1
# response
# _source stores the original data of the document
# 200 response
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "username": "Jiavg",
    "age": 21
  }
}
# 404 response
{
  "_index": "test_index",
  "_type": "doc",
  "_id": "2",
  "found": false
}
```
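- Search documents with `_search`
```
# without a query (returns all documents)
GET /{index_name}/{type_name}/_search
# with a query (returns all documents that match)
GET /{index_name}/{type_name}/_search
{
# query conditions
}
```
Example: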
```shell
# without a query (returns all documents)
# request
GET /test_index/doc/_search
# response
# took: time spent on the query, in ms
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2, # total number of matching documents
"max_score": 1,
"hits": [ # array with the details of the returned documents, the first 10 by default
{
"_index": "test_index",
"_type": "doc",
"_id": "QzXQT3EBkfca6l6Y9SXp",
"_score": 1, # score of the document
"_source": {
"username": "jlc",
"age": 20
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"username": "Jiavg",
"age": 21
}
}
]
}
}
# with a query (returns all documents that match)
# request
GET /test_index/doc/_search
{
"query": {
"term": {
"_id": 1
}
}
}
# response
{
"took": 23,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"username": "Jiavg",
"age": 21
}
}
]
}
}
```
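Client code usually only needs the matching documents from a `_search` response, which sit under `hits.hits[*]._source`. The sketch below pulls them out of a response shaped like the 6.x examples above; `extract_sources` is a hypothetical helper, and the sample response is a trimmed copy of the first example.

```python
def extract_sources(search_response):
    """Return each hit's _source merged with its _id, in response order."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [{"_id": hit["_id"], **hit["_source"]} for hit in hits]


# Trimmed copy of the "search all documents" response shown above.
sample_response = {
    "took": 4,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {"_index": "test_index", "_type": "doc", "_id": "QzXQT3EBkfca6l6Y9SXp",
             "_score": 1, "_source": {"username": "jlc", "age": 20}},
            {"_index": "test_index", "_type": "doc", "_id": "1",
             "_score": 1, "_source": {"username": "Jiavg", "age": 21}},
        ],
    },
}

documents = extract_sources(sample_response)
for doc in documents:
    print(doc)
```

Because only the first 10 hits are returned by default, `len(documents)` can be smaller than `hits.total` on larger indices.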
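Update a document
```
# sending the document to the same URI re-indexes it, replacing the stored content
POST /{index_name}/{type_name}/{id}
{
# updated document content
}
```
Delete a document
```
DELETE /{index_name}/{type_name}/{id}
```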
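es allows multiple documents to be created in a single request, which reduces network overhead and improves write throughput.
The endpoint is `_bulk`.
`index` and `create` both create a document. The difference is that when the document id already exists, `index` overwrites the existing content, whereas `create` returns an error.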
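The difference between the two actions can be mimicked with a plain dictionary. This is only a sketch of the semantics described above, not of the Elasticsearch implementation; `index_doc`, `create_doc`, and `DocumentExistsError` are hypothetical names.

```python
class DocumentExistsError(Exception):
    """Raised when create is used with an id that is already taken."""


def index_doc(store, doc_id, source):
    """index semantics: write the document, overwriting any existing content."""
    result = "updated" if doc_id in store else "created"
    store[doc_id] = dict(source)
    return result


def create_doc(store, doc_id, source):
    """create semantics: write the document only if the id is not taken yet."""
    if doc_id in store:
        raise DocumentExistsError(f"document {doc_id!r} already exists")
    store[doc_id] = dict(source)
    return "created"


store = {}
print(index_doc(store, "1", {"username": "Jiavg", "age": 21}))
print(index_doc(store, "1", {"username": "Jiavg-1", "age": 5}))
try:
    create_doc(store, "1", {"username": "znc", "age": 22})
except DocumentExistsError as exc:
    print("create failed:", exc)
```

After these calls the stored document for id `1` is the one written by the second `index_doc` call, because the failed `create_doc` leaves the store untouched.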
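Note: when using `_bulk`, the REST API endpoint is `/_bulk`, and it expects the following newline-delimited JSON (NDJSON) structure:
```
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
```
NDJSON: ndjson (Newline Delimited JSON) is a relatively new standard and very simple: in an .ndjson file, every line is one ordinary JSON object. Each object must drop the line breaks that would normally be used for pretty-printing, and a JSON string cannot contain a raw line break anyway (it uses \n instead).
For this reason, a request whose body is ordinary pretty-printed JSON fails.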
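The structure described above can be produced programmatically by serializing every action line and every source line as compact single-line JSON. The following is a minimal sketch using only the standard library; `build_bulk_body` and the `(action, meta, source)` tuple layout are assumptions made for this example.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, meta, source) tuples into an NDJSON bulk body.

    source may be None for actions that carry no source line.
    """
    lines = []
    for action, meta, source in operations:
        # action_and_meta_data line, e.g. {"index":{"_index":...}}
        lines.append(json.dumps({action: meta}, separators=(",", ":")))
        if source is not None:
            # optional_source line
            lines.append(json.dumps(source, separators=(",", ":")))
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


operations = [
    ("index", {"_index": "test_index", "_type": "doc", "_id": 1},
     {"username": "Jiavg-1", "age": 5}),
    ("update", {"_index": "test_index", "_type": "doc", "_id": "QzXQT3EBkfca6l6Y9SXp"},
     {"doc": {"age": 25}}),
    ("create", {"_index": "test_index", "_type": "doc", "_id": 3},
     {"username": "znc", "age": 22}),
]

body = build_bulk_body(operations)
print(body, end="")
```

`json.dumps` never emits raw line breaks unless an `indent` is requested, so each object stays on exactly one line, which is what the NDJSON format requires.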
Example:
```shell
# NDJSON
# request
POST _bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"username":"Jiavg-1","age":5}
{"update":{"_index":"test_index","_type":"doc","_id":"QzXQT3EBkfca6l6Y9SXp"}}
{"doc":{"age":25}}
{"create":{"_index":"test_index","_type":"doc","_id":3}}
{"username":"znc","age":22}
# response
{
  "took": 52,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "test_index",
        "_type": "doc",
        "_id": "1",
        "_version": 3,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 2,
        "_primary_term": 2,
        "status": 200
      }
    },
    {
      "update": {
        "_index": "test_index",
        "_type": "doc",
        "_id": "QzXQT3EBkfca6l6Y9SXp",
        "_version": 2,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 2,
        "status": 200
      }
    },
    {
      "create": {
        "_index": "test_index",
        "_type": "doc",
        "_id": "3",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 2,
        "status": 201
      }
    }
  ]
}
```
```shell
# plain (pretty-printed) JSON
# request
POST _bulk
{
"index": {
"_index": "test_index",
"_type": "doc",
"_id": 1
}
}
{
"username": "Jiavg-1",
"age": 5
}
{
"update": {
"_index": "test_index",
"_type": "doc",
"_id": "QzXQT3EBkfca6l6Y9SXp"
}
}
{
"doc": {
"age": 25
}
}
{
"create": {
"_index": "test_index",
"_type": "doc",
"_id": 3
}
}
{
"username": "znc",
"age": 22
}
# response
{
"error": {
"root_cause": [
{
"type": "json_e_o_f_exception",
"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 3]"
}
],
"type": "json_e_o_f_exception",
"reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@618ff58; line: 1, column: 3]"
},
"status": 500
}
```
For the difference between JSON and NDJSON, see: https://blog.csdn.net/github_38885296/article/details/100915601
es allows multiple documents to be retrieved in a single request.
The endpoint is `_mget`, as follows: